Configure Parameters for a Specific Test

This notebook guides model developers through documenting a simple classification model for a bank customer churn use case. It shows you how to set up the ValidMind Developer Framework, document a model with it, and configure parameters for a test or set of tests in a specific section of the documentation. For this simple demonstration, we use a bank customer churn dataset from Kaggle (https://www.kaggle.com/code/kmalit/bank-customer-churn-prediction/data).

We will train a sample model and demonstrate the following documentation functionalities:

  • Initializing the ValidMind Developer Framework
  • Using a sample dataset provided by the library to train a simple classification model
  • Configuring the parameters of a set of tests to generate documentation about the data and model

ValidMind at a glance

We offer a platform for managing model risk, including risk associated with AI and statistical models. As a model developer, you use the ValidMind Developer Framework to automate documentation and validation tests, and then use the ValidMind AI Risk Platform UI to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.

If this is your first time trying out ValidMind:

Get started — The basics, including key concepts, and how our products work

Get started with the ValidMind Developer Framework — The path for developers, more code samples, and our developer reference

Before you begin

New to ValidMind?

For access to all features available in this notebook, create a free ValidMind account.

Signing up is FREE — Sign up now

If you encounter errors due to missing modules in your Python environment, install the modules with pip install, and then re-run the notebook. For more help, refer to Installing Python Modules.

Install ValidMind Developer Framework

%pip install -q validmind
Note: you may need to restart the kernel to use updated packages.

Initializing the Python environment

import xgboost as xgb

%matplotlib inline

Initialize the client library

ValidMind generates a unique code snippet for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

  1. In a browser, log into the Platform UI.

  2. In the left sidebar, navigate to Model Inventory and click + Register new model.

  3. Enter the model details, making sure to select Binary classification as the template and Marketing/Sales - Attrition/Churn Management as the use case, and click Continue. (Need more help?)

  4. Go to Getting Started and click Copy snippet to clipboard.

Next, replace this placeholder with your own code snippet:

# Replace with your code snippet

import validmind as vm

vm.init(
    api_host="https://api.prod.validmind.ai/api/v1/tracking",
    api_key="...",
    api_secret="...",
    project="..."
)
2024-04-10 17:29:52,094 - INFO(validmind.api_client): Connected to ValidMind. Project: [Int. Tests] Customer Churn - Initial Validation (cltnl29bz00051omgwepjgu1r)

Preview the model’s documentation template

All models are assigned a documentation template when registered. The template defines a list of sections that are used to document the model. Each section can contain any number of rich text and test-driven blocks that populate the documentation. Test-driven blocks are populated by running tests against the model.

We can preview the model documentation template for this model by running the following code:

vm.preview_template()

Load the demo dataset

For the purpose of this demonstration, we will use a sample dataset provided by the ValidMind library.

# Import the sample dataset from the library
from validmind.datasets.classification import customer_churn as demo_dataset
# You can try a different dataset with:
# from validmind.datasets.classification import taiwan_credit as demo_dataset

raw_df = demo_dataset.load_data()

Documenting the model

First, we need to preprocess the dataset and produce the training, validation, and test splits.

Preprocess the raw dataset

For demonstration purposes, we simplify preprocessing by calling demo_dataset.preprocess, which splits the raw dataset into training, validation, and test sets:

train_df, validation_df, test_df = demo_dataset.preprocess(raw_df)

x_train = train_df.drop(demo_dataset.target_column, axis=1)
y_train = train_df[demo_dataset.target_column]
x_val = validation_df.drop(demo_dataset.target_column, axis=1)
y_val = validation_df[demo_dataset.target_column]
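
The demo helper hides the details of the split; the sketch below shows a comparable three-way split built with plain pandas and scikit-learn. The column names and 60/20/20 ratios here are illustrative assumptions, not the helper's actual implementation:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for raw_df; "Exited" mirrors the churn target column
df = pd.DataFrame({
    "Age": range(100),
    "Balance": [float(i * 10) for i in range(100)],
    "Exited": [i % 2 for i in range(100)],
})

# First carve out 20% for test, then split the rest 75/25 into train/validation
train_val, test = train_test_split(df, test_size=0.2, random_state=42)
train, validation = train_test_split(train_val, test_size=0.25, random_state=42)

print(len(train), len(validation), len(test))  # 60 20 20
```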

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)
XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, early_stopping_rounds=10,
              enable_categorical=False, eval_metric=['error', 'logloss', 'auc'],
              feature_types=None, gamma=None, gpu_id=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, n_estimators=100, n_jobs=None,
              num_parallel_tree=None, predictor=None, random_state=None, ...)

Initialize the ValidMind datasets

Before you can run tests, you must first initialize a ValidMind dataset object using the init_dataset function from the ValidMind (vm) module.

This function takes a number of arguments:

  • dataset — the raw dataset that you want to provide as input to tests
  • input_id — a unique identifier that allows tracking what inputs are used when running each individual test
  • target_column — a required argument if tests require access to true values. This is the name of the target column in the dataset
  • class_labels — an optional value to map predicted classes to class labels

With all datasets ready, you can now initialize the raw, training and test datasets (raw_df, train_df and test_df) created earlier into their own dataset objects using vm.init_dataset():

vm_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column=demo_dataset.target_column,
    class_labels=demo_dataset.class_labels
)

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=demo_dataset.target_column
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    input_id="test_dataset",
    target_column=demo_dataset.target_column
)
2024-04-10 17:29:53,408 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:29:53,632 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-10 17:29:53,678 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...

We also initialize a model object using vm.init_model():

vm_model = vm.init_model(
    model,
    input_id="model"
)

Assign predictions to the datasets

We can now use the assign_predictions() method from the Dataset object to link existing predictions to any model. If no prediction values are passed, the method will compute predictions automatically:

x_test = test_df.drop(demo_dataset.target_column, axis=1)

vm_train_ds.assign_predictions(
    model=vm_model,
    prediction_values=list(model.predict(x_train))
)
vm_test_ds.assign_predictions(
    model=vm_model,
    prediction_values=list(model.predict(x_test))
)

Run the template documentation suite

We are now ready to run the model’s documentation tests as defined in its template. The following call runs the tests in the template’s Model Diagnosis section (selected via the section argument) and sends the resulting documentation artifacts to the ValidMind Platform.

full_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_train_ds,
        "datasets": (vm_train_ds, vm_test_ds),
        "model": vm_model
    },
    section="model_diagnosis",
)

Configuration of parameters for model diagnosis tests

Each test ships with default parameter values, which may not suit every use case. The ValidMind Developer Framework exposes these parameters so that you can adjust them to your requirements.

A config dictionary can be applied to specific tests to override their default parameter values.

The format of the config is:

config = {
    "<test1_id>": {
        "<default_param_1>": value,
        "<default_param_2>": value,
    },
     "<test2_id>": {
        "<default_param_1>": value,
        "<default_param_2>": value,
    },
}
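
The intent of the override is the usual dictionary merge: for each test ID present in config, the supplied params replace that test's defaults, while unlisted params keep their default values. A plain-Python illustration of this behavior (the default values below are made up for the example; the framework's exact merge logic is internal to ValidMind):

```python
# Hypothetical defaults for a single test (illustrative values only)
defaults = {"cut_off_percentage": 4, "features_columns": None}

# User-supplied overrides, as they would appear under a test ID in `config`
overrides = {"cut_off_percentage": 3}

# Later keys win: overridden params replace defaults, the rest are kept
params = {**defaults, **overrides}
print(params)  # {'cut_off_percentage': 3, 'features_columns': None}
```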

You can pass the configuration to run_documentation_tests() and run_test_suite() through the config argument, fine-tuning the suite to your specific requirements.

config = {
    "validmind.model_validation.sklearn.OverfitDiagnosis": {
        "params": {
            "cut_off_percentage": 3,
            "feature_columns": ["Age", "Balance", "Tenure", "NumOfProducts"]
        },
    },
    "validmind.model_validation.sklearn.WeakspotsDiagnosis": {
        "params": {
            "features_columns": ["Age", "Balance"],
            "accuracy_gap_threshold": 85,
        },
    },
    "validmind.model_validation.sklearn.RobustnessDiagnosis": {
        "params": {
            "features_columns": ["Balance", "Tenure"],
            "scaling_factor_std_dev_list": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
            "accuracy_decay_threshold": 4,
        },
    }
}

full_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_train_ds,
        "datasets": (vm_train_ds, vm_test_ds),
        "model": vm_model
    },
    config=config,
    section="model_diagnosis",
)

Next steps

You can look at the results of these tests right in the notebook where you ran the code, as you would expect. But there is a better way: view the test results as part of your model documentation right in the ValidMind Platform UI:

  1. In the Platform UI, go to the Documentation page for the model you registered earlier.

  2. Expand the Model Development section.

What you can see now is a more easily consumable version of the model diagnosis tests you just performed, along with other parts of your model documentation that still need to be completed.

If you want to learn more about where you are in the model documentation process, take a look at How do I use the framework?